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A NUMERICAL EMPIRICAL BAYES PROCEDURE FOR FINDING AN INTERVAL ESTIMATE 



Summary 



A numerical procedure is outlined for obtaining an interval estimate 
of a parameter in an empirical Bayes estimation problem. The case where 
each observed value x has a binomial distribution, conditional on a 
parameter £ , is the only case considered. For each x , the parameter 
estimated is the expected value of £ given x . The main purpose is to 
throw some light on the effectiveness of empirical Bayes estimation in 
samples of various sizes. Illustrative numerical results are presented. 

1. Introductory Numerical Example 



The following example illustrates a type of situation in which the 
usual estimator for a binomial parameter can be very misleading. A sample 
of n = 10 independent observations was drawn from a Bernouilli distri- 
bution with probability of success it . This was repeated for N = 10,000 
different values of it , these being obtained by random sampling from a 
prechosen distribution of it's. The sample proportion p of successes 
was found for each of the 10,000 samples. The resulting sample distribution 
Nf(p) of the 10,000 values of p is shown below: 
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statistic if the sample of 10 is considered by itself. For even values of 
lOp , the table also shows interval estimates (it, it) of the regression of 
it on p , obtained by the empirical Bayes approach to be described in this 
paper. 

The purpose of the above numerical example is to display the disagree- 
ment between the usual estimates p of rt and the empirical Bayes interval 
estimates shown. The reason for such sharp disagreement is that the popula- 
tion distribution of jf. chosen to generate these artificial data, until now 
concealed from the reader, actually had a negligible standard deviation so 
that Nf(p) is in effect a random sample of 10,000 from a binomial distri- 
bution with it = 0.5 • Although this information was not available for 
estimation purposes, the empirical Bayes estimation procedure partially 
recovered equivalent information from the sample Nf(p) • A pleasing 
result is that each interval estimate here turns out to include the true 
value it = 0.5 • 



2. Mathematical Formulation 

Suppose that for each randomly drawn observation, there is not only 
an observed value x of the discrete random variable X , but also a 
corresponding unobservable value £ of the continuous random variable 
Z . For any given £ , x is drawn at random from the binomial distri- 
bution 



h(xlO = Prob(X = x|0 = (£)£ X (1 - 0 n ‘ X , 
where n is known. Given randomly drawn observations 



x — 0, 1^ • • • j n y 
^l** *^2 * * * * * > the 



a) 
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£ corresponding to any particular observation x can be estimated by 

Q. 

empirical Bayes point -estimation methods (for example, Robbins, 1956; 

Maritz, 1966; Copas, 1969; Griffin & Krutchkoff, 1971)* More needs to be 
known about the effectiveness of such methods where n and N are not 
both very large. 

Denote by G(£) the unknown cumulative distribution function of Z 
for the population from which the unobservable values are drawn. In view 
of (l), it will be assumed that the range of £ is 0 < £ < 1 . Ordinarily, 
G(£) is thought of as continuous, but we will not exclude the possibility 
that it is a step function. 

The (unconditional) probability distribution of X for the population 
can be written 

1 

<t> G (x) = J h(x|£) dG(0 , x = 0, 1, . . ., n . (2) 

0 

The observed sample distribution of x^,x^, ...,x^ , to be denoted by f(x) , 
is the distribution of a random sample from <t> (x) . 

ll 

If G(£) were known, the value of £ corresponding to any observed 
x would in common practice be estimated by the regression function 

1 

H z | x s e G (z|x) = J £ h(xls) *5(0 , x = 0,1, » ( 3 ) 

G 0 

usually called the Bayes estimator. When G(£) is unknown, as here, the 

A 

problem of finding n point estimator £ for the £ corresponding to an 
observed x is a standard empirical Bayes problem. Typically, a point 

7 
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estimate (fL | , say) of e(z|x ) , the regression of Z on x evalu- 

a 

ated at x & , is used as the empirical Bayes estimator of £ for observa- 
tion a . The binomial case considered here is of particular interest 
(as compared, for example, to the Poisson) since in the binomial case G(£) 
is unidentifiable--complete knowledge of <t> (x) is sufficient to determine 

Lj 

only the first n moments of G((;) (Skeilam, 1948). 

The present note is explicitly concerned with an interval estimator 

of , x = 0,1, ...,n , G(£) being unknown. Given the 

empirical Bayes model already specified and a sample with the observed 

distribution f(x) , what range of values for p._ i , x = 0,1, ...,n , 

is reasonably consistent with the observed f(x) ? 

Consider the set T consisting of all distribution functions G(£) 

2 

such that the chi square ( X n ) between <t> _(x) , defined by (2), and the 

2 

given f(x) is less than X , the 1 - Ct percentile of the chi square 
distribution: 



n [f (x) - 4> g (x)] 2 



( 4 ) 



For each possible value of x , find jl^ ■ Max^^) and |i^ a Min((i2| x ) > 

r a r a 

the maximum and minimum values of (3) in . For any given x , any value 

of |i_ | in the interval (p^ ,p ) is consistent with the data; values 
Li I X -v*X cxx 

outside this interval will be considered implausible. 

The foregoing is not an ideal way to set up an interval estimate. 

Until better methods are implemented, however, it can throw some light on 
the accuracy of empirical Bayes point estimation. For the standard empirical 

A 

Bayes problem of estimating £ , any point estimate £ in the interval 
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^Ox^abc^ would minimize the squared errors of estimation for some G(£) 



in r a . Any estimate outside this interval would not. 

If desired, the interval described above can be interpreted as a con- 
fidence interval for i with confidence level > 1 - OL . This may be 

I X 

seen as follows. For any given G(£) , is a random variable so de- 

fined as to include G(£) 1 - OL of the time. Thus (y^ , (I ) must 
include the true q i at least 1 - OL of the time. 

LJ I X 

3« Bounds for the Regression of Z on x 
Substituting (l) in (2) and expanding, we have 



where y* is the moment about the origin of order s for the distribution 
s 

G(£) . Ruling out the degenerate case where Z takes no values other than 



1 




0 




x = 0, 1, • • • , n , ( 5 ) 



0 or 1 (and where consequently <t>(x) = 0 unless x = 0 or n ), we have 

that y' >0 for all s . Similarly, from (3), 
s 




x = 0, 1, . . ., n 



( 6 ) 
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Consider first a restricted case where pj, . . . , are fixed. In 
this special case, (6) is seen to be maximized (minimized) by maximizing 
(minimizing) ^n+1 if n - x is even, by minimizing (maximizing) p^ + ^ if 
n - x is odd. A theorem of Markov (see Posse, 1886, sections V8 and V9; or 
Karlin & Shapley, 1953) shows that if p^,p^, ...,p^ (considered fixed) are the 
moments of some frequency distribution, then the maximum (minimum) value of 
^n+l is uni( l uel y attained when G(£) is a certain specified step function. 

If n is even, all the frequency is concentrated at exactly n/2 + 1 dif- 
ferent values of £ , including £ = 1 if is maximized, £ = 0 if 

is minimized. The situation for odd n is similar, but need not be 
detailed here. 

This leads to the key conclusion that in order to find G ( £ ) maximizing 
or minimizing (6) for fixed x , when are given , it is only 

necessary to determine M 5 n/2 unknown values £q, . . . , £ or 

^1* ^ 2 * ’ t0 § ether ^i^ 1 corresponding M unknown frequencies 

^0'®!* * * *'®M-1 or ®1 , ®2 , *** , %! * n °f- necessary to admit to consid- 

eration any of the continuous frequency distributions on (0,l) nor any 
discrete distribution with more than M unknown values of £ . 

In our actual problem, we wish to find the probability distribution 
, say, maximizing (6) or the g^, x (£) minimizing (6) subject to the 
restriction that the corresponding cumulative distribution function is in 
r a * This restriction does not change the key conclusion stated above, 
since holding p^,p£, fixed also holds * G (0),* G (l),...,Un) 

fixed, because of (5), and thus fixes . Thus the extremizing g(£) 
will still be a step function with exactly M + 1 different values of £ 9 
as before. (For example, let p£,p^, • • «, p^ be the first n moments of 



-7- 



. Markov* s theorem states that the minimizing G(£) with these mo- 
ments is of* the special form described. Thus if ^ exists, it is of 
this special form.) 

It is, of course, always possible that is empty. This situation 

has not yet arisen in practical application. The smaller the value of 0! 
chosen, the less likely it is that r will be empty. 

For a discrete G(£) such as the first line of (5) can be 

written 



M 

♦<*) ■ G> E n 1 - V 

m=0 



n-x 



(7) 



where either = 0 or = 0 . Similarly, (3) becomes 

rn\ M 



|x 4xj S 0 ^m ^ ^ 

' ' ra =0 



n-x 



( 8 ) 



Formulas derived by Markov (not given here) provide the explicit solu- 
tion (if any) to the extremization problem when |i*, ^ are fixed. 
These formulas do not help with the more general problem to be solved here, 
which seems to require the numerical methods of mathematical programming. 

4. Numerical Methods 

Note that the use of a step function for G(£) here is required by 
the problem itself, not imposed for the convenience of the writer. For ’ 
simplicity, the following discussion deals only with minimization. 
Maximization is essentially similar. 

11 



Thanks to Markov, our problem is now to find and , 
m = 0,1, ...,M - 1 or m = 1,2, ...,M , so as to minimize (8), subject to 
the restrictions 

0 < t m < 1 . ^ > 0 , and 2^ < 1 , (9) 

and also subject to 00. The problem thus stated can be solved numerically 

for any given set of data by mathematical programming techniques. 

Actually, the inequality ( < ) restriction in (4) can without loss of 

generality be replaced by equality. A proof is given in the appendix for 

situations where at least two or three of the g^ are nonzero. 

The writer is indebted to Martha Hamilton who developed the computer 

program to carry out the required minimizations and maximizations numerically 

for various sets of data. The program implements a sequential unconstrained 

minimization technique (SUMT) of Fiacco and McCormick (1968, chapter 4). 

2 

The constraint on X was handled by use of a penalty function; other 
constraints were dealt with by simpler means. The unconstrained minimiza- 
tions required for SUMT were carried out by a program developed by 
JiJreskog (1967* section 8) and modified by Hamilton, implementing the 
Fletcher -Powell-Davidon (1963) method. 

All computer runs were made in double precision on an IBM 360/65* 

As a check, each of the extremization problems dealt with in Table 1 
was run with two different starting points, one of which was completely 
random within the limitations 0 < g^ and 0 < ^ < 1 ( m = 0,1, ...,M ) 
and Zg^ = 1 . When N = 12,990 , the agreement between the p^ or p^ 
reached from two different starting points was to at least four decimal 
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places; when N = 130 , the agreement was to at least three decimal places. 
This suggests that the intervals obtained represent global, not just local, 
maxima and minima, at least to a three-decimal-place approximation. 

Hundreds of* other checks were made to be sure that small changes in 

or in ^ x (0 would not give more extreme ^ x or p ax , respec- 
tively. All such checks were satisfied. 

Numerical Results 

The procedure described was applied to the real data shown in the 
second column of Table 1. This column shows the frequency distribution 
of N = 12,990 independent observations (actually, test scores of 12,990 
students — their number of correct answers on a psychological test composed 
of n = 20 questions). A separate study (see below) shows that this 
distribution is compatible with the mathematical model given by (l) and 
( 2 ). 

The fifth column shows interval estimates of (i^| x , obtained for these 
data by the method outlined, with a = .05 • These empirical Bayes intervals, 
unlike ordinary interval estimates, are wider at extreme values of x than 
at middle values. In the middle, the intervals shown for N = 12,990 are 
happily short. 

Although it is not obvious from a look at the table, no straight-line 
regression of £ on x can be fitted inside the intervals shown for 
N = 12,990 • The indicated nonlinearity is not rigorously demonstrated ' 
by the methods described here. However, linearity of regression implies 
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Table 1. Observed frequency distributions f(x) and corresponding interval 
estimates ( Q! = .05 ) for the regression of Z on x . 



X 


Nf(x) 




P z|x 


Interval Estimate of n^| x 




N = 12,990 


N = 130 




N = 12,990 


N = 130 


20 


63 


2 


.898 


.852 -.952 


.690-1.000 


19 


l4l 


2 


.863 






18 


220 
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.823 


.780 -.846 


.615 -.943 


17 
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2 
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.716 


.690-. 743 


•573-. 855 
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.619 


.605 -.646 


.510-. 749 


13 
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4 


.583 
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1203 
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•553 


•534-. 564 


.440-. 636 
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1443 


19 


.526 
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1550 


13 


.500 


.485 -.511 


.404 -.556 
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1409 


12 


.475 


.461-. 487 




8 


1235 


13 


.451 


.436-. 463 


.363 -.491 
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18 


.426 
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. hQ2 
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.305 -*491 
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.199 -*491 
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• 334 
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27 
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.314 


.233-. 383 


.049 -.491 
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12 
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.296 


.107-. 380 
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.280 


.010-. 37^ 


.000 -.491 
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that ^q( x ) i s a negative hypergeometric distribution (Lord & Novick, 

1968 , section 25 * 6 ); thus linearity can be tested by determining whether 
f(x) can be considered a random sample from such a distribution. 

In order to investigate the effects of sample size, a random sample 
of 130 observations was drawn f'rom the 12,990* The resulting f(x) is 
shown in the table along with the corresponding interval estimates of 
^z|x * ^ n ^ erva ^ s are °T course much wider than for N = 12,990 , but 

fortunately not 10 times as wide. (Grouping of frequencies was used in the 
calculations where necessary so that the denominator of (k) should never 
be less than 1.) 

The fourth column of the table is included as a partial check on the 
validity of the intervals obtained. This column shows the regression of 
£ on x corresponding to a certain G(£) which was found (in a separate 
study) to provide a good fit to the f(x) in column two, the obtained 
chi square between ♦ (x) and f(x) being near the 50 th percentile 

U 

of the tabled distribution of chi square for 20 degrees of freedom. It 
is pleasant to find that these regression values all lie well within the 
interval estimates shown in column five. 

This research was supported in part by the Office of Naval Research. 
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It will be shown here that Ba x tt) , the G(0 that minimizes (6) 

2 2 

or (8) subject to (k) and (9) must lie on the boundary where X = X . 

To avoid tediousness, the proofs are limited to situations where at least 
two or three (as convenient) of the are distinct and have nonzero 

frequency. 

It will be shown first that ji^| x > considered as a function of the 

a. and the t . has no unrestricted minimum satisfying (9). This is proved 
m m 

by showing that if we are given any set with K > 2 of g^ and satis- 
fying (9), we can reduce the computed from (8) by a change in the g m 

and 5^ that does not violate (9). Thus if a restricted minimum with K > 2 
is found by the methods of section 4, it must lie on the only remaining bound- 
ary ( = X^ ), imposed by restriction (4), since otherwise ^ j x could 

still be reduced by the methods outlined. We will treat explicitly only 
the case where x is even. Thus t 0 = 0 b y Markov* s theorem. (Similar 
proofs will apply when x is odd.) Let £. < ... < be bbe K > 2 
distinct nonzero values at which g^ (£) has nonzero frequency. 

All possibilities will now be covered by the following four cases. 

Case 1 : g n >0, 0<x<n. 

of (8) with respect to £. : 

a [x ♦ 1 - (n + l)y E ^(1 - ? m r X 

m=u 

- »Ci> l ejzHi - tjr* 

m=0 






Consider the derivative 
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K 



• = 8 m ^< 1 - ^) n ' X Ui(l - E t ) + (Ei - E m )(x - nj )] . (Al) 

ra=u 



Take i = 0 and suppose for the moment that K > 2 and that £q (= 0) 
is replaced by a very small positive quantity. The first term in the 
brackets can be neglected. The second term in the brackets is zero when 
m = 0 but is negative otherwise. Since all quantities outside the brackets 
are nonnegative and some, at least, are positive, the derivative (Al) will 



be negative . Thus 



^1: 



can be reduced by replacing by a small posi- 



tive quantity (this can also be seen intuitively). 



Case 2: 



go = 0 , 0 < 



x < n . 



Consider the effect on Mr£| x a sinfl ^ change in g^ and a small 
compensating change in g^ , holding all other g^ fixed. Treating g^ 
as a function of g^ defined by the equation 

K-l 



(A2) 



and using the formula 



dp 



dp, 



dp. 
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we find from (8) that when the g^ (m ^ i,K) are fixed, 
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£k(i - U"'*] 



3 K 
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« E 
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(Alt) 



The last expression in (A^) makes use of the fact that since g Q = 0 for 
Case 2, any summation in (A4) can be written either including or excluding 
m = 0 . 

The second term in the brackets is never positive. Now take i = 1 . 
The first term in brackets is now zero when m = 1 and negative otherwise. 
All quantities outside the brackets are nonnegative; if either (; < 1 , 

or if K > 3 , then some of these quantities are positive. If so, the 



derivative of (A4) is necessarily negative for i = 1 . Thus 
be reduced by shifting some frequency from L to t . 



|: 



can 



(if = 1 ar.d K = 2 , all terms under the summation in (A^) are 
zero. This special case can be dealt with by using (Al) again with i = 1 . 
Since g^ = 0, 1-^=0, and = 0 , the last expression in 
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. This derivative of 



-A4- 



(Al) simplifies to ^Ix^l* ^l^ 1 " ^i) 2n “ 2x 
l-i^ | x is necessarily positive for x < n . Thus | x can reduced by 

decreasing . ) 



Case 3 : x = 0 

In this case (8) becomes 



M 



Jo ^ (1 ' ^ 

zix M _ 

i «.» • <.> 



Using (A2) and (A3) as before, but with i = 0 , we find, since = 0 , 



that 



M 



“ Jo ^ (1 ' ^ ' 5 K (1 • ^ 



m=0 



” ^(1 - ? m ) n t - ^ - «K> “ - «. + ? m (l - 



m=0 



M 



z ^(i - t m n - 1. - (? K - w - ^ ■ 



(A5) 



m=0 



Since (A5) is always negative, an increase in gQ together with a cor- 
responding decrease in g^ will reduce M£| x 
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in. Case 3* 
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Case 4: 



x = n 



In this case 



Z x 



m 



M 

£ „n+l 

1=1 «iA, 



m 



M 

2 ^ 
m=l 



n 

’m 



Using (A2) and (A3) as before with i = 0 , we have 
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if a t n ( l' n+1 ^ 4. t *n+l,.,n» 

* 2 <W' E K > + E . 



‘zU 

dg r 



m=l 



m=l 



M 



4 E «■£«. - y 



(AS) 



m=l 



Since (A6) is always negative, an increase in g Q together with a com- 
pensating decrease in g K will reduce H z | x in Case 4. 

The four cases listed are exhaustive, provided K > 2 . Similar proofs 
could be written to eliminate this proviso. Consequently, a minimum for 
cannot occur except on the boundary established by (4). 
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